Skip to content

Modernize SIP string handling#1645

Merged
sipsorcery merged 1 commit into
sipsorcery-org:masterfrom
paulomorgado:strings-4
May 31, 2026
Merged

Modernize SIP string handling#1645
sipsorcery merged 1 commit into
sipsorcery-org:masterfrom
paulomorgado:strings-4

Conversation

@paulomorgado

Copy link
Copy Markdown
Contributor

Update SIP and user-agent code to use clearer interpolation, span/string helpers, and explicit string comparison semantics. Add Polyfill support required by the new string APIs and remove the duplicate KeyValuePair Deconstruct extension.

Split of #1639

Comment thread src/SIPSorcery/core/SIP/SIPConstants.cs Outdated
@paulomorgado paulomorgado force-pushed the strings-4 branch 2 times, most recently from 98d5b86 to 4d3b642 Compare May 29, 2026 16:37
@sipsorcery

Copy link
Copy Markdown
Member

This PR did introduce a few new bugs in the SIP logic. One would have broken any SIP request that used the Route or Record-Route headers. The unit tests that show the bugs have been committed to master and if you rebase this PR the new tests will fail.

Overall the improvements do generally modernise the code but the size of PRs like this is unmanageable. The concerns need to be done separately, i.e. if it's jsut to update string formatting that should be one PR. Any functional changes should be done in a different PR. It tkaes mroe work to review PRs of this size than it does to write them.

@paulomorgado

Copy link
Copy Markdown
Contributor Author

The issue was the ToString() method declared as public new String() instead of public override String().

I had already run into 2 of these that were covered by tests.

And now I found there are other 6. Why?

@sipsorcery

Copy link
Copy Markdown
Member

And now I found there are other 6. Why?

Just the way I did it at the time, it was ~20 years ago! It doesn't matter so much as to the "why" as to the fact that the current behaviour works. Adding breaking "optimisations" would be a BIG problem for library users.

@paulomorgado paulomorgado marked this pull request as draft May 30, 2026 20:53
@sipsorcery

Copy link
Copy Markdown
Member

Another regression detected. Unit tests added to master see here.

Modernize SIP message parsing and formatting for performance:
- Replace string concat/Regex with Span-based parsing
- Use StringBuilder and interpolated strings in ToString()
- Improve header/parameter parsing, folded header handling
- Update code style, error messages, and logging
- Add Polyfill, set C# 14, ensure .NET 4.6.2 compatibility
- Remove obsolete code, update project metadata and docs
@paulomorgado

Copy link
Copy Markdown
Contributor Author

Another regression detected. Unit tests added to master see here.

All tests are passing now.

@paulomorgado

Copy link
Copy Markdown
Contributor Author
using System;

var c = new C();
Console.WriteLine(c.ToString());
Console.WriteLine(c);

class C
{
    public new string ToString()
    {
        return "Hello, World!";
    }
}
$ dotnet run app.cs
Hello, World!
C

@paulomorgado

Copy link
Copy Markdown
Contributor Author

Benchmark Analysis — sipsorcery-org/sipsorcery PR #1645

"Modernize SIP string handling"

Environment: BenchmarkDotNet v0.15.8, Windows 11, Intel Core i9-13900K,
.NET 10.0.8 (RyuJIT x86-64-v3), Job=ShortRun (3 iterations, 3 warmup)


1. ParseSIPExtensions

Results

Scenario Old (ns) New (ns) Speedup Alloc Old Alloc New Alloc Δ
Single known (100rel) 38.6 17.2 2.2× 104 B 72 B −31%
3 known extensions (100rel, norefersub, replaces) 92.1 37.1 2.5× 336 B 72 B −79%
3 known, upper-case input (100REL, REPLACES, NOREFERSUB) 119.1 34.3 3.5× 464 B 72 B −85%
Mixed known + unknown (100rel, x-unknown, multiple-refer) 107.4 39.9 2.7× 400 B 112 B −72%
Two unknowns (x-foo, x-bar) 78.2 43.7 1.8× 296 B 184 B −38%
Single unknown only (x-unknown-ext) 34.2 16.1 2.1× 64 B 80 B +25% ⚠️

What changed

Old implementation:

string[] extensions = extensionList.Trim().Split(',');
foreach (string extension in extensions)
{
    string trimmedExtension = extension.Trim().ToLower();
    switch (trimmedExtension) { ... }
    // unknowns collected via string concatenation (+=)
}
  • Trim().Split(',') allocates a string[] array plus a new string per token
  • ToLower() allocates a new lowercased string per token regardless of case
  • Unknown extensions collected via += string concatenation (further allocations per unknown)

New implementation:

var extensions = extensionList.AsSpan().Trim();
foreach (var extensionRange in extensions.Split(','))
{
    var trimmedExtension = extensions[extensionRange].Trim(); // stack-only slice
    switch (trimmedExtension) { ... }  // OrdinalIgnoreCase span comparison — no ToLower
}
// firstUnknown held as ReadOnlySpan<char>, ToString() deferred until end
// 2+ unknowns collected via StringBuilder.Append(ReadOnlySpan<char>)
  • AsSpan().Trim() is stack-only — zero heap allocations for splitting
  • OrdinalIgnoreCase span comparisons replace ToLower() — no per-token string allocation for known extensions
  • firstUnknown stored as ReadOnlySpan<char> and only materialised via .ToString() at the very end, and only if it is the sole unknown
  • 2+ unknowns collected in a StringBuilder via Append(ReadOnlySpan<char>) — no intermediate strings per token, replaces List<string> + string.Join

Why known-extension cases improve so dramatically

Every known token (e.g. 100rel, replaces, norefersub) never touches the heap at
all in the new path — the OrdinalIgnoreCase span comparison is done entirely on the
stack. The old path allocated one ToLower() string per token regardless of whether
the token was known or unknown.

Upper-case input (100REL, REPLACES, NOREFERSUB) shows the largest gain (3.5×
faster, −85% allocations
) because ToLower was doing real character-by-character
work and creating a full new string for every token.

The single-unknown +16 B explained

Both paths allocate new List<SIPExtensions>() (32 B) — this cancels out.
The delta comes entirely from what each path allocates for the unknown token itself.

Old path allocations for "x-unknown-ext"

Object Bytes Reason
string[1] from .Split(',') 32 B Array object; no comma found so .NET returns new[]{ this } — the single element is a pointer to the already-existing input string, no new string object is created for the token itself
.ToLower() on "x-unknown-ext" 0 B The string is already all-lowercase ASCII — .NET's ChangeCaseCommon scans for a character that needs changing, finds none, and returns the same instance
unknownExtensions = extension.Trim() 0 B No leading/trailing whitespace — Trim() returns the same instance

Total: 32 B (List) + 32 B (array) = 64 B

New path allocations for "x-unknown-ext"

Object Bytes Reason
firstUnknown.ToString() 48 B Must create a brand-new heap string: 8 (pre-header) + 8 (MT pointer) + 4 (length field) + 26 (13 chars × 2 bytes) + 2 (null terminator) = 48 B

Total: 32 B (List) + 48 B (string) = 80 B

Visualised

Old:  [ string[1] array: 32 B ] ──points to──▶ [ "x-unknown-ext" (pre-allocated input, 0 B extra) ]

New:  (no array: 0 B)           ──creates──▶   [ "x-unknown-ext" (new heap string, 48 B)          ]

Δ = 48 B − 32 B = +16 B

The old code's Split allocates a pointer array (32 B) whose single element
references the already-existing input string — no new string object is allocated for
the token itself. The new span-based code eliminates the array entirely, but
span.ToString() must materialise a brand-new string object (48 B) on the managed heap.

This gap cannot be closed without changing the method's out string unknownExtensions
contract to out ReadOnlySpan<char> — a breaking API change. The 2.1× time
improvement
makes this a net win despite the marginal allocation increase.


2. ParseCustomHeaders

Results

Scenario Old (Regex, ns) New (Span, ns) Speedup Alloc Old Alloc New
Via (match) 86.9 0.9 94× faster 248 B 0 B
Content-Length (match) 234.4 3.0 77× faster 248 B 0 B
Max-Forwards (trim + match) 193.7 8.2 24× faster 296 B 0 B
X-Custom-Header (no match) 83.4 2.7 31× faster 0 B 0 B
X-Sip-Callid: value (no match) 90.5 2.6 35× faster 0 B 0 B

What changed

Old implementation:

Regex.Match(header, SIP_CUSTOM_HEADER_PREFIX, RegexOptions.IgnoreCase)

New implementation:

header.AsSpan().Trim().StartsWith(..., StringComparison.OrdinalIgnoreCase)

Key findings

  • Replacing Regex.Match() with ReadOnlySpan<char> comparisons delivers a
    24–94× speedup across all inputs.
  • AsSpan().Trim() is fully stack-based — zero heap allocations in every scenario,
    compared to the Match object the regex engine allocated on every call.
  • Both matching and non-matching paths benefit equally; regex engine initialisation
    dominates the cost regardless of whether the pattern is ultimately found.
  • The Max-Forwards case (leading/trailing whitespace) shows the lowest speedup
    (24×) because AsSpan().Trim() adds a small scanning cost — still 24× faster and
    still allocates nothing.
  • Content-Length shows the highest variance in the old path (error: 280 ns) due to
    regex JIT compilation occasionally being triggered during the short run.

Summary

Change Best case Worst case Allocations
ParseSIPExtensions 3.5× faster 2.1× faster −85% to −31%; one structural edge case: +16 B (explained above)
ParseCustomHeaders 94× faster 24× faster 0 B in all cases (was 248–296 B)

@paulomorgado paulomorgado marked this pull request as ready for review May 31, 2026 11:33
@sipsorcery sipsorcery merged commit ef1e6e8 into sipsorcery-org:master May 31, 2026
7 checks passed
@paulomorgado paulomorgado deleted the strings-4 branch May 31, 2026 19:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants